Predicting Interest: Another Use for Latent Semantic Analysis
Authors
Abstract
Latent Semantic Analysis (LSA) is a statistical technique for extracting semantic information from text corpora. LSA has been used successfully to grade student essays automatically (Intelligent Essay Scoring), to model human language learning, and to model language comprehension. We examine how LSA may help predict a reader's interest in a selection of news articles, based on the reader's reported interest in other articles. The initial results are encouraging: LSA with a default corpus and setup can closely match human preferences, with RMSE values as low as 2.09 (human ratings on a scale of 1-10). Additionally, an Adapting Measure, which selects the best parameters for each individual, produced significantly better results (RMSE = 1.79).
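The core LSA step the abstract relies on can be sketched as follows. This is an illustrative toy only: the documents, topics, and the RMSE helper below are invented placeholders, not the paper's actual corpus, term weighting, or rating model.

```python
import numpy as np

# Toy corpus (invented, not the paper's data). Real LSA setups usually
# apply a term weighting (e.g. log-entropy or TF-IDF) before the SVD;
# raw counts are used here for brevity.
docs = [
    "stocks market economy trade",     # article the reader rated highly
    "football match goal league",      # unrelated article
    "market trade tariffs economy",    # related article
]
vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for w in vocab] for d in docs],
                  dtype=float)

# LSA core: truncate the SVD of the document-term matrix to k dimensions.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
doc_vecs = U[:, :k] * s[:k]  # each row is an article in the latent space

def cosine(a, b):
    """Cosine similarity, the usual LSA measure of semantic relatedness."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Articles on the same topic end up close in the latent space, so their
# similarity to already-rated articles can serve as an interest signal.
sim_related = cosine(doc_vecs[0], doc_vecs[2])    # same topic -> near 1
sim_unrelated = cosine(doc_vecs[0], doc_vecs[1])  # different topic -> near 0

def rmse(predicted, actual):
    """Root-mean-square error, the evaluation measure quoted above."""
    p, a = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((p - a) ** 2)))
```

How similarity in the latent space is mapped onto a 1-10 interest rating (and how the Adapting Measure tunes parameters per reader) is specific to the paper and not reproduced here.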
Similar resources
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users alike. Constructing an adequate query that best specifies a user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Predicting Word Clipping with Latent Semantic Analysis
In this paper, we compare a resource-driven approach with a task-specific classification model for a new near-synonym word choice sub-task, predicting whether a full or a clipped form of a word will be used (e.g. doctor or doc) in a given context. Our results indicate that the resource-driven approach, the use of a formality lexicon, can provide competitive performance, with the parameters of the...
Learning semantic structures from in-domain documents
Semantic analysis is a core area of natural language understanding that has typically focused on predicting domain-independent representations. However, such representations are unable to fully realize the rich diversity of technical content prevalent in a variety of specialized domains. Taking the standard supervised approach to domain-specific semantic analysis requires expensive annotation ef...
Predicting Interesting Things in Text
While reading a document, a user may encounter concepts, entities, and topics that she is interested in exploring more. We propose models of “interestingness”, which aim to predict the level of interest a user has in the various text spans in a document. We obtain naturally occurring interest signals by observing user browsing behavior in clicks from one page to another. We cast the problem of ...